Skip to content

Conversation

@Tofel
Copy link
Contributor

@Tofel Tofel commented Jun 6, 2025

Succesful run: https://github.com/smartcontractkit/chainlink/actions/runs/15484446872/job/43596302667?pr=18037

Below is a summarization created by an LLM (gpt-4-0125-preview). Be mindful of hallucinations and verify accuracy.

Why

The changes improve the robustness and error handling of container start-up processes across various components (blockchain nodes, CL nodes, JD, and PostgreSQL) by introducing retries. This is particularly useful in environments where transient issues may temporarily prevent a container from starting successfully.

What

  • framework/components/blockchain/containers.go
    • Updated container start method to framework.StartContainerWithRetry for EVM containers. This change improves error handling by retrying container start-up, reducing the likelihood of failures due to transient issues.
  • framework/components/clnode/clnode.go
    • Changed the method for starting containers to framework.StartContainerWithRetry. This modification ensures better stability by attempting to start the container multiple times in case of initial failure.
  • framework/components/jd/jd.go
    • Updated container start method to framework.StartContainerWithRetry. Enhances reliability by trying to start the container multiple times if the first attempt fails.
  • framework/components/postgres/postgres.go
    • Changed the PostgreSQL container start method to framework.StartContainerWithRetry. This adjustment aims to increase success rates of container starts by implementing a retry mechanism.
  • framework/docker.go
    • Added StartContainerWithRetry function along with NaiveRetrier and helper functions. These additions provide the capability to retry starting containers with a simple retry strategy, thereby enhancing the resilience of the container start-up process.

Below is a summarization created by an LLM (gpt-4-0125-preview). Be mindful of hallucinations and verify accuracy.

Why

The primary goal of these changes is to enhance the stability and reliability of container initialization in integration tests. By incorporating a retry mechanism when starting containers, we address transient issues that may occur due to various reasons such as network delays or resource constraints, ensuring a more robust setup for testing environments.

What

  • framework/components/blockchain/containers.go
    • Replaced testcontainers.GenericContainer call with framework.StartContainerWithRetry to attempt container start with retries.
  • framework/components/clnode/clnode.go
    • Replaced tc.GenericContainer call with framework.StartContainerWithRetry for retrying container starts.
  • framework/components/jd/jd.go
    • Replaced tc.GenericContainer call with framework.StartContainerWithRetry adding retry logic when starting containers.
  • framework/components/postgres/postgres.go
    • Replaced testcontainers.GenericContainer with framework.StartContainerWithRetry to implement retry functionality on container start.
  • framework/docker.go
    • Added NaiveRetrier function which attempts to restart the container with a simple retry logic.
    • Implemented StartContainerWithRetry function that tries to start a container with up to 3 retries using provided retriers or a default one.
    • Added removeContainer function to support retry logic by removing existing containers before retrying.

@Tofel Tofel marked this pull request as ready for review June 6, 2025 08:32
@Tofel Tofel requested a review from a team as a code owner June 6, 2025 08:32
@Tofel Tofel requested a review from skudasov June 6, 2025 08:36
Copy link
Contributor

@skudasov skudasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use static ports so we can then apply chaos and break anything without exposing testcontainers-go layer to the test.
Linux systems typically use 32768-60999 for temporary ports, this issue was caused because JD was exposing 42424, maybe we can change the mapping to 125XX for JobDistributor and have this service both testable and without conflicts?
We run strictly one environment per GHA runner, so mapping to 125XX should fix it.

@Tofel
Copy link
Contributor Author

Tofel commented Jun 6, 2025

@Tofel Tofel closed this Jun 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants